Abstract


This review paper is intended for scholars with different backgrounds, possibly in only one of the subjects 
covered, and therefore little background knowledge is assumed. 

The first part is an introduction to classical and quantum information theory (CIT, QIT): basic definitions
and tools of CIT are introduced, such as the information content of a random variable, the typical set, and some
principles of data compression. Some concepts and results of QIT are then introduced, such as the qubit, pure
and mixed states, the Holevo theorem, the no-cloning theorem, and quantum complementarity.

In the second part, two applications of QIT to open problems in theoretical physics are discussed. 

The black hole (BH) information paradox is related to the phenomenon of Hawking radiation (HR). Considering
a BH starting in a pure state, after its complete evaporation only the Hawking radiation will remain, and this
radiation is shown to be in a mixed state. This either describes a non-unitary evolution of an isolated system,
contradicting the evolution postulate of quantum mechanics and violating the no-cloning theorem, or implies that
the initial information content can escape the BH, thereby contradicting general relativity. The progress toward
the solution of the paradox is discussed.

The renormalization group (RG) aims at the extraction of the macroscopic description of a physical system from 
its microscopic description. This passage from microscopic to macroscopic can be described in terms of several steps 
from one scale to another, and is therefore formalized as the action of a group. The c-theorem proves the existence, 
under certain conditions, of a function which is monotonically decreasing along the group transformations. This 
result suggests an interpretation of this function as entropy, and its use to study the information flow along the 
RG transformations. 
1 Classical information theory

Classical information theory was introduced by
Claude Shannon in 1948 HUS- In this seminal work
he devised a quantitative definition of information
content, and then formal definitions of other relevant
quantities, in order to allow for a quantitative treatment
of those and other related subjects. In the same
work he also proved some important theorems that
hold for such quantities. In this first section we
summarize the main concepts of classical information
theory introduced by Shannon.


1.1 Information content 

Shannon's first important contribution was to address
the question: “What is information?”. More precisely,
he was looking for a way to measure the amount of
information contained in a given physical system. This
is a rather elusive concept, and it can depend on things
that are difficult to quantify, such as the context and
the background knowledge of the observer.

To give an example, we can think of the amount
of information contained in human facial expressions.
We know at an intuitive level that a large amount of
information is contained in a single facial expression
(see figure 1), since we sometimes make important
decisions based on such information. But at the same
intuitive level we can appreciate how difficult it is to
quantify this amount. Moreover, the type of information
in the example of facial expressions refers to emotional
states or states of consciousness, and therefore involves
some degree of subjectivity in its definition (think
e.g. of the famous painting “Mona Lisa” by Leonardo
da Vinci, and its enigmatic facial expression, so difficult
to define). As usual in science, Shannon overcame
this type of difficulty by first clearly defining the
scope of his definition. His definition of “information
content” is indeed limited to systems that can be
described by a random variable.

Figure 1: Examples of facial expressions.

Since we need a precise definition of random variable,
following the notation of MacKay E we will use
the concept of ensemble, i.e. the collection of three
objects:

$$X = (x, \mathcal{A}_X, \mathcal{P}_X) \tag{1}$$

where $x$ represents the value of the random variable,
$\mathcal{A}_X$ is the set of the possible values it can assume, and
$\mathcal{P}_X$ is its probability distribution over those values (i.e.
the set of the probabilities of each possible value).

1.1.1 Information content of a single outcome 

Based on this concept we then introduce the following
definition for the amount of information gained from
the knowledge of a single outcome $x_i \in \mathcal{A}_X$ of the
random variable $X$:

$$h(x_i) = \frac{1}{\log 2}\,\log\frac{1}{p(x_i)} \tag{2}$$

where $p(x_i) \in \mathcal{P}_X$ is the probability of the outcome
$x_i$. To give an intuition of this definition we can consider
the example of the weather forecast. Let's simplify,
and consider a situation where only two weather
conditions are possible: sunny (☀) and rainy (☂). So,
in our example the random variable is “tomorrow’s
weather”, the two possible values are $\mathcal{A}_X$ = {☀, ☂},
and there will be a probability distribution
$\mathcal{P}_X$ = {p(☀), p(☂)}.

It is worth noting that Shannon's definition is
totally independent of the actual value of the outcome,
and only depends on its probability. It is in order
to stress this concept that we have used the symbols
{☀, ☂} for the values of the outcome, which are
not numerical, and do not appear at all in (2). It is
also worth stressing that this definition of “amount of
information contained in a single outcome” is a
differential definition: it is the difference between the
amount of information we possess about the random
variable before and after we know the outcome.

We can illustrate this concept of “differential definition”
using the weather variable: in a location where
there is a very high probability of sunny weather,
with the probability distribution $\mathcal{P}_X$ = {p(☀) =
0.99, p(☂) = 0.01}, if tomorrow we see sunny weather,
we will have learnt very little information. On the
other hand, if tomorrow we find rainy weather, we will
have gained a lot of useful information with respect
to today.
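As a quick numerical check of (2), the minimal Python sketch below evaluates the information content of the two outcomes of the weather example, using the same distribution as above. The function name `information_content` is our own, hypothetical choice; the factor $1/\log 2$ of (2) is kept explicitly, so the result is in bits whatever logarithm is used.

```python
import math


def information_content(p: float) -> float:
    """Information content h = (1/log 2) * log(1/p) of an outcome with probability p, in bits (eq. 2)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return math.log(1.0 / p) / math.log(2.0)


# Weather example from the text: sunny weather is very likely, rain is rare.
weather = {"sunny": 0.99, "rainy": 0.01}
for outcome, p in weather.items():
    print(f"h({outcome}) = {information_content(p):.4f} bits")
# prints h(sunny) = 0.0145 bits, then h(rainy) = 6.6439 bits
```

The rare outcome carries far more information than the likely one, which is exactly the differential reading of (2) given above.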

1.1.2 Information content of a random variable

Using the definition (2) of the information content of a
single outcome, we can define the information content
of a whole random variable:

$$H(X) = \sum_i p(x_i)\, h(x_i) \tag{3}$$

This definition can be seen as the information gained
from each outcome, expressed in (2), averaged over all
the possible outcomes.

This expression is formally equal (apart from constant
factors) to the entropy defined in thermodynamics,
and Shannon proposed the same name in the context
of information theory. This entropy is sometimes
called “Shannon entropy”, to distinguish it from its
quantum counterpart, discussed in the following. In
the case of a binary variable (i.e. a variable with only
two possible outcomes) we have:

$$\mathcal{A}_X = \{0, 1\} \tag{4a}$$
$$\mathcal{P}_X = \{p,\ 1-p\}, \tag{4b}$$

and the entropy of a binary random variable gets the
special name of binary entropy:

$$H^{(2)} = \frac{1}{\log 2}\left[p\,\log\frac{1}{p} + (1-p)\,\log\frac{1}{1-p}\right]. \tag{5}$$

With this normalization the entropy is said to be
measured in bits, and the entropy of an unbiased binary
variable is 1. Sometimes another normalization is used,
in which the factor $1/\log 2$ is dropped and the
logarithms are natural logarithms ($\ln = \log_e$); in this
case the entropy is said to be measured in nats.

Figure 2: Plot of the entropy of a binary variable
(binary entropy) shown in (5).

1.1.3 Comments

A plot of the binary entropy as a function of p is shown
in figure 2.

Again, as for the information content of a single outcome,
we can give some intuition for the definition
of the entropy (i.e. information content) of a random
variable using the example of the weather forecast. We
can notice that in the case of a very biased probability
distribution $\mathcal{P}_X$ = {p(☀) = 0.99, p(☂) = 0.01},
although the information content of the very unlikely
outcome, h(☂) = $\log_2(1/0.01) \approx 6.64$ bits, is very high,
its weight (i.e. probability) in the average (5) is very
small, so that the entropy of this variable is only about
0.08 bits. The highest value of the binary entropy is
instead reached for the uniform probability distribution
$\mathcal{P}_X$ = {p(☀) = 0.5, p(☂) = 0.5}, i.e. p = 1/2, where
all the outcomes are equiprobable. It can be shown that
this is true not only for a binary variable: the entropy
of any random variable is maximal for the uniform
distribution, where $H(X) = \log_2|\mathcal{A}_X|$. This also explains
the constant factor $1/\log 2$ in the definitions of the
entropies: it is a normalization factor, so that the
maximum entropy of a binary variable is normalized to 1.
The factor also has the advantage of making the
definitions (2) and (5) independent of the choice of the
base of the logarithms. Alternative and equivalent
definitions are:

$$H = -\sum_i p(x_i)\,\log_2 p(x_i) \tag{6a}$$
$$H^{(2)} = -p\,\log_2 p - (1-p)\,\log_2(1-p). \tag{6b}$$

We can find an intuitive justification of the definition
(2) through the following observations. First, the
probability of two outcomes of independent variables is
the product of the probabilities of each outcome. Second,
for the definition (2) of “information from a single
outcome” it is reasonable that the information gained
from two outcomes of two independent variables is the
sum of the information gained from each outcome.
Third, we have emphasized that the information content
only depends on the probability. Given all this, when
looking for an expression of the information content,
the logarithm of the probability fits all the requirements.
The last detail, using the logarithm of the inverse of the
probability, comes from the requirement that the entropy
of a variable has to be maximal (and not minimal) in
the case of uniform probability distribution (see figure 2).
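The definitions (3), (5) and (6) can be checked numerically. The sketch below uses only the Python standard library; the names `entropy` and `binary_entropy` are hypothetical. It computes the entropy in bits (or in nats with `base=math.e`), reproduces the value of about 0.08 bits for the biased weather variable, and confirms on a grid that the binary entropy peaks at p = 1/2.

```python
import math


def entropy(probs, base=2.0):
    """Entropy sum_i p_i log(1/p_i) / log(base): bits for base 2 (eq. 3/6a), nats for base e."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p_i = 0 contribute nothing (the limit of p log(1/p) as p -> 0 is 0).
    return sum(p * math.log(1.0 / p) for p in probs if p > 0) / math.log(base)


def binary_entropy(p):
    """Binary entropy H^(2) of eq. (5)/(6b), in bits."""
    return entropy([p, 1.0 - p])


print(binary_entropy(0.99))              # biased weather variable: about 0.081 bits
print(binary_entropy(0.5))               # unbiased binary variable: 1 bit
print(entropy([0.5, 0.5], base=math.e))  # the same variable in nats: ln 2, about 0.693

# Scan p on a grid: the maximum of the binary entropy is at p = 1/2.
grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=binary_entropy))     # 0.5
```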

1.2 Other important definitions 

For the applications we want to introduce in the following
sections, we need to define a few more quantities.
The definitions we need involve two random variables:

$$X = (x, \mathcal{A}_X, \mathcal{P}_X) \tag{7a}$$
$$Y = (y, \mathcal{A}_Y, \mathcal{P}_Y) \tag{7b}$$

1.2.1 Joint entropy 

The joint probability $p(x, y)$ is defined as the probability
that the variable $X$ has the outcome $x$ and the
variable $Y$ has the outcome $y$. Based on this concept,
it is easy to define the joint entropy of two random
variables as:

$$H(X, Y) = \frac{1}{\log 2}\sum_{x \in \mathcal{A}_X}\sum_{y \in \mathcal{A}_Y} p(x, y)\,\log\frac{1}{p(x, y)} \tag{8}$$

It is worth recalling from probability theory that the
joint probability is the product of the probabilities in
the case of independent random variables, $p(x, y) = p(x)\,p(y)$.
So, in the case of independent variables, the joint
entropy is the sum of the entropies: $H(X, Y) = H(X) + H(Y)$.

Complementary to the concept of joint entropy is
the definition of the mutual information of two random
variables:

$$I(X : Y) = H(X) + H(Y) - H(X, Y). \tag{9}$$

We can use the intuition that mutual information is a
measure of how far two random variables are from being
independent: by (9) and the previous remark, it vanishes
for independent variables. It is also useful to rephrase
this and think of mutual information as a measure of how
much we know about a random variable $X$ if we know the
random variable $Y$. A graphical representation is
frequently used to visualize the relationship between
entropy, joint entropy and mutual information. Instead
of Venn diagrams 00, which are sometimes misleading, we
prefer the alternative approach used e.g. by 0,
shown in figure 3.


Figure 3: A graphical representation of the relationship
between entropy, joint entropy and mutual information
(bars labelled H(X, Y), H(X), H(Y) and I(X : Y)).
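A small numerical illustration of (8) and (9), assuming the joint distribution is given as a table `joint[x][y]` = p(x, y); the helper names are hypothetical. The two example tables show the two extreme cases discussed above: independent variables (zero mutual information) and a variable that is an exact copy of the other (mutual information equal to its full entropy).

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a list of probabilities; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """I(X:Y) = H(X) + H(Y) - H(X,Y), eq. (9), from a table joint[x][y] = p(x, y)."""
    p_x = [sum(row) for row in joint]        # marginal distribution of X
    p_y = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
    h_xy = shannon_entropy([p for row in joint for p in row])  # joint entropy, eq. (8)
    return shannon_entropy(p_x) + shannon_entropy(p_y) - h_xy


independent = [[0.25, 0.25], [0.25, 0.25]]  # p(x, y) = p(x) p(y)
copied = [[0.5, 0.0], [0.0, 0.5]]           # Y always equals X

print(mutual_information(independent))  # about 0 bits: Y says nothing about X
print(mutual_information(copied))       # about 1 bit: Y determines X completely
```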


1.3 Source coding theorem 

After having introduced some definitions, we now describe
a theorem called the source coding theorem.

First, we have to introduce the notion of a source,
described as a black box producing sequences of values.
The way to model this is to consider those values
as the outcomes of random variables. So we consider
a sequence of $N$ random variables, and we assume the
following hypotheses: that the variables are independent
of each other, that the set of possible values
is identical for all the variables, and finally that the
probability distributions are identical. This is usually
summarized by saying that the $N$ variables are independent
and identically distributed, or i.i.d.

1.3.1 Typical set 

Let's consider a sequence of $N$ i.i.d. binary variables.
We can write the sequence of variables as
$(X_1, X_2, \dots, X_N) = X^N$, and a single outcome will be
a sequence of values $(x_1, x_2, \dots, x_N) = x^N$, which
in the case of a binary variable can be represented as
a sequence of $N$ ones and zeroes. We can call $\mathcal{A}_{X^N}$
the set of all the possible sequences, and we can write
it down (e.g. using the lexicographic order) as follows:

$$\begin{array}{c} (0, 0, 0, 0, 0, \dots, 0) \\ (1, 0, 0, 0, 0, \dots, 0) \\ (0, 1, 0, 0, 0, \dots, 0) \\ \vdots \\ (1, 1, 1, 1, 1, \dots, 1) \end{array} \tag{10}$$

Given all this, the source coding theorem proves the
existence of a subset of $\mathcal{A}_{X^N}$, called the typical set, with
the property that “almost all” the information contained
in the random variables is indeed contained in
this subset. Moreover, the theorem proves that for a
sequence of $N$ i.i.d. variables with entropy $H(X)$, the
typical set has about $2^{NH(X)}$ elements. To be more
precise, the theorem can be verbally stated as follows:
Theorem 1.1 (Source coding theorem) $N$ i.i.d.
random variables each with entropy $H(X)$ can be compressed
into more than $NH(X)$ bits with negligible risk
of information loss, as $N \to \infty$; conversely, if they are
compressed into fewer than $NH(X)$ bits it is “virtually
certain” that some information will be lost.


It is of course possible to give a more precise statement,
where instead of the “almost all” and “virtually
certain” phrases the proper mathematical expressions,
with “the epsilons and the deltas” typical of
mathematical limits, are used. For a proof of the theorem
see e.g. [3JE:.


1.3.2 Compression

In figure 4 we can see a graphical representation of the
typical set, along with the idea that it is possible to
label the elements of the typical set. The fundamental
idea of compression is that if we use only the $NH(X)$
binary symbols needed to label the elements of the typical
set, instead of the $N$ symbols of the full sequences,
we have a negligible probability of losing information.
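The size and probability of the typical set can be computed exactly for a binary source by grouping sequences by their number of ones. The sketch below assumes the standard ε-typicality criterion (as in MacKay), which the text does not spell out: a sequence is typical when its information content per symbol is within ε of $H(X)$. The values N = 1000, p = 0.1 and ε = 0.05 are illustrative choices, and `typical_set_stats` is a hypothetical name.

```python
import math


def typical_set_stats(N, p, eps):
    """Entropy, size and probability of the eps-typical set of N i.i.d. bits with P(1) = p.

    A sequence with k ones has probability p**k * (1 - p)**(N - k); it is counted as
    typical when |-(1/N) log2 P(sequence) - H| <= eps.
    """
    H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    size, prob = 0, 0.0
    for k in range(N + 1):
        info_per_symbol = -(k * math.log2(p) + (N - k) * math.log2(1 - p)) / N
        if abs(info_per_symbol - H) <= eps:
            n_seq = math.comb(N, k)  # number of sequences with exactly k ones
            size += n_seq
            prob += n_seq * p**k * (1 - p) ** (N - k)
    return H, size, prob


N, p, eps = 1000, 0.1, 0.05
H, size, prob = typical_set_stats(N, p, eps)
print(f"H(X) = {H:.3f} bits; log2(size of typical set) = {math.log2(size):.1f}; N = {N}")
print(f"N(H + eps) = {N * (H + eps):.1f}; probability of the typical set = {prob:.3f}")
```

With these numbers the typical set is a tiny fraction of the $2^N$ possible sequences (its size stays below $2^{N(H+\varepsilon)}$), yet it carries most of the probability, which is the content of Theorem 1.1 at finite N.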


2 Quantum Information Theory 


If the physical system used as support for the transmission
and processing of information is a quantum
system, classical information theory is no longer valid
in all its parts, and a different theory has to be developed:
quantum information theory (QIT). As the
classical random variable with two possible values (the
bit) is the building block of CIT, the quantum random
variable whose possible states are described by vectors of
a Hilbert space of dimension two (the qubit) is the
building block of QIT (see figure 5). The experimental
efforts to implement a qubit in a physical system
already have a long history. Among the different approaches
we can mention ion traps mis], quantum dots
mm, nuclear spins accessed via nuclear magnetic
resonance mm, colour defects in crystals m o
and superconducting structures USUIS]. In this section
we review the usual axiomatic introduction of
quantum mechanics (QM) and the formal tools which
are necessary to describe the applications of QIT presented
in the following. Among the many references
for the axiomatic introduction to quantum mechanics,
and the statements of its postulates, we refer mostly
to [17].


Figure 4: The typical set as a subset of all the possible
sequences of outcomes of $N$ i.i.d. random variables. (a)
The elements of the typical set can be labeled with a number
between 1 and $2^{NH(X)}$. (b) This number can be
written with $NH(X)$ binary symbols.
Figure 5: The Bloch sphere is a two-dimensional manifold,
and is used to represent the pure states of a qubit,
i.e. the unit vectors (up to a global phase) of a
two-dimensional Hilbert space.

2.1 Mixed states and density operator formalism

The state of a quantum system is represented by
an element of a Hilbert space $\mathcal{H}$ with norm one,
which in the Dirac notation can be written as
a “ket” $|\psi\rangle \in \mathcal{H}$. In the case of a qubit (i.e. a
two-dimensional system) a basis can be written as
$\{|0\rangle, |1\rangle\}$ (called the computational basis), and the
generic state of the qubit will be $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$,
where $\alpha, \beta \in \mathbb{C}$ with $|\alpha|^2 + |\beta|^2 = 1$; the link to the
angles shown in figure 5b is
$|\psi\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}|1\rangle$.

In analogy to the concept of random variable introduced
above, we need a formal tool to describe a
situation where the state of the quantum system is
unknown, and only the set of possible states, with
their probability distribution, is known. A system in
such conditions is said to be in a mixed state, and
the mathematical tool to describe a mixed state is
the density operator.

2.1.1 Density operator of a pure state

To introduce the density operator, let's first recall
some details of linear algebra. The scalar product
in the Dirac notation is written as $\langle\phi|\psi\rangle$; if we choose
a basis $\{|1\rangle, |2\rangle, \dots, |n\rangle, \dots\}$ of the Hilbert space, it
is possible to compute the components $\langle i|\psi\rangle = \psi_i$
and $\langle\phi|i\rangle = \phi_i^*$ of the vectors and co-vectors, so as to
write them as one-column and one-row matrices
respectively. In this notation, the scalar product can be
seen as a product between matrices:

$$\langle\phi|\psi\rangle = \left(\phi_1^*, \phi_2^*, \dots, \phi_N^*\right)\begin{pmatrix}\psi_1\\ \psi_2\\ \vdots\\ \psi_N\end{pmatrix} \tag{11a}$$
$$= \sum_{i=1}^{N}\phi_i^*\,\psi_i \tag{11b}$$


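But if we invert the order, and write

$$|\psi\rangle\langle\phi| = \begin{pmatrix}\psi_1\\ \psi_2\\ \vdots\\ \psi_N\end{pmatrix}\left(\phi_1^*, \phi_2^*, \dots, \phi_N^*\right) \tag{12a}$$
$$= \begin{pmatrix}\psi_1\phi_1^* & \psi_1\phi_2^* & \cdots & \psi_1\phi_N^*\\ \psi_2\phi_1^* & \psi_2\phi_2^* & \cdots & \psi_2\phi_N^*\\ \vdots & \vdots & \ddots & \vdots\\ \psi_N\phi_1^* & \psi_N\phi_2^* & \cdots & \psi_N\phi_N^*\end{pmatrix} \tag{12b}$$

we obtain a matrix, which can be interpreted as the
representation, in the chosen basis, of an operator defined
on the same Hilbert space.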

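This was written for two different states $|\psi\rangle$ and $|\phi\rangle$.
But using this type of product we can associate an
operator to any single vector of the Hilbert space:

$$|\psi\rangle \;\longmapsto\; |\psi\rangle\langle\psi| = \begin{pmatrix}\psi_1\\ \psi_2\\ \vdots\\ \psi_N\end{pmatrix}\left(\psi_1^*, \psi_2^*, \dots, \psi_N^*\right) \tag{13a}$$
$$= \begin{pmatrix}\psi_1\psi_1^* & \psi_1\psi_2^* & \cdots & \psi_1\psi_N^*\\ \psi_2\psi_1^* & \psi_2\psi_2^* & \cdots & \psi_2\psi_N^*\\ \vdots & \vdots & \ddots & \vdots\\ \psi_N\psi_1^* & \psi_N\psi_2^* & \cdots & \psi_N\psi_N^*\end{pmatrix} \tag{13b}$$
$$\stackrel{\text{def}}{=} \rho_\psi \tag{13c}$$

The operator $\rho_\psi$ is the density operator of the pure
state $|\psi\rangle$.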
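The products (11) and (12)-(13) are easy to reproduce with plain Python complex numbers. The sketch below builds $\rho_\psi = |\psi\rangle\langle\psi|$ for a qubit state parametrized by the Bloch angles of figure 5; the angle values and all helper names are arbitrary, hypothetical choices. It also checks the standard fact, not stated in the text, that $\mathrm{Tr}(\rho_\psi^2) = 1$ for a pure state.

```python
import math


def ket_bra(psi, phi):
    """Matrix of |psi><phi|, with elements psi_i * conj(phi_j) (eq. 12)."""
    return [[a * b.conjugate() for b in phi] for a in psi]


def bra_ket(phi, psi):
    """Scalar product <phi|psi> = sum_i conj(phi_i) * psi_i (eq. 11)."""
    return sum(b.conjugate() * a for b, a in zip(phi, psi))


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


# Qubit state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>, with arbitrary example angles.
theta, phi = math.pi / 3, math.pi / 4
psi = [complex(math.cos(theta / 2)), complex(math.cos(phi), math.sin(phi)) * math.sin(theta / 2)]

rho = ket_bra(psi, psi)              # density operator rho_psi of the pure state (eq. 13)
print(bra_ket(psi, psi).real)        # squared norm <psi|psi>: 1 up to rounding
print(trace(rho).real)               # Tr(rho_psi) = <psi|psi> = 1
print(trace(matmul(rho, rho)).real)  # Tr(rho^2) = 1 for a pure state
```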


2.1.2 Density operator of a mixed state 


When the state of a quantum system can be represented
as a vector of a Hilbert space (i.e. a ket in Dirac
notation), the system is said to be in a pure state. But
if we want to represent the quantum analog of a random
variable, we have to use the concept of mixed state
introduced above, where we don't know the state of the
system, but only a set of possible states and their
respective probabilities. A mixed state whose possible
states form an orthonormal basis and are all equiprobable
is called a maximally mixed state. It is interesting to
point out that whether the system is in a pure or a
mixed state depends on both the system and the observer,
because the knowledge about the system depends also on
the observer, and not only on the system itself. The
density operator formalism is able to represent this
type of states effectively.

Indeed, if the possible states of the system are
$\{|\alpha_1\rangle, |\alpha_2\rangle, \dots, |\alpha_N\rangle\}$, with probabilities
$\{p_1, p_2, \dots, p_N\}$, then the mixed state can be
represented as:

$$\rho = \sum_{i=1}^{N} p_i\,|\alpha_i\rangle\langle\alpha_i|. \tag{14}$$

This can be seen as a linear combination of the density
operators associated with the pure states, where the
coefficients are the probabilities.

This is an abstract representation of the density
operators; if we fix a basis in the Hilbert space, we can
write a density operator as a matrix, which will be called
the density matrix. A special and not uncommon case is
when the set of possible states of a mixed state is an
orthonormal basis of the Hilbert space. If we write
this orthonormal basis as $\{|1\rangle, |2\rangle, \dots, |n\rangle, \dots\}$, and
then represent in this basis the density matrix associated
with one of these pure states, all the matrix elements will
be zero, apart from one single element on the diagonal
equal to one, in the position corresponding to the
position of the pure state in the basis:



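$$|n\rangle\langle n| = \begin{pmatrix}0\\ \vdots\\ 1\\ \vdots\end{pmatrix}\left(0, \dots, 1, \dots\right) = \begin{pmatrix}0 & \cdots & 0 & \cdots\\ \vdots & \ddots & \vdots & \\ 0 & \cdots & 1 & \cdots\\ \vdots & & \vdots & \ddots\end{pmatrix} \tag{15}$$

where the only non-zero entry, equal to 1, is in the
$n$-th row and $n$-th column.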


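If we then consider a mixed state such that the possible
states are all the elements of the basis:

$$\rho = \sum_i p_i\,|i\rangle\langle i| \tag{16}$$

its density matrix, represented in this same basis, will
be diagonal, with the probabilities as diagonal elements:

$$\rho = \begin{pmatrix}p_1 & 0 & \cdots & 0 & \cdots\\ 0 & p_2 & \cdots & 0 & \cdots\\ \vdots & \vdots & \ddots & \vdots & \\ 0 & 0 & \cdots & p_n & \cdots\\ \vdots & \vdots & & \vdots & \ddots\end{pmatrix} \tag{17}$$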


If the density matrix is represented in this basis, non-zero
off-diagonal elements indicate that some of the possible
states are quantum superpositions of basis states. From
the normalization property of the probability distribution
it is then easy to see that:

$$\mathrm{Tr}(\rho) = \sum_i p_i = 1, \tag{18}$$

where $\mathrm{Tr}(\rho)$ indicates the trace, defined as the sum
of the diagonal elements. Since the trace is invariant
under a change of basis, we can conclude that
$\mathrm{Tr}(\rho) = 1$ is a property of any density matrix. Another
property of any density matrix is that its eigenvalues
are non-negative. This can be proven rigorously, and can
be easily seen in the case of a diagonal density matrix,
where the eigenvalues have the meaning of probabilities.
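The properties just listed can be verified numerically for a qubit. The sketch below builds (14) for two hypothetical examples, a mixture of the non-orthogonal states $|0\rangle$ and $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ and the maximally mixed state, and checks the unit trace (18) and the non-negative eigenvalues. The closed-form eigenvalue formula from trace and determinant is our own shortcut, valid only for 2x2 Hermitian matrices; helper names are hypothetical.

```python
import math


def density_matrix(states, probs):
    """rho = sum_i p_i |alpha_i><alpha_i| (eq. 14), states given as lists of complex amplitudes."""
    d = len(states[0])
    rho = [[0j] * d for _ in range(d)]
    for p, a in zip(probs, states):
        for i in range(d):
            for j in range(d):
                rho[i][j] += p * a[i] * a[j].conjugate()
    return rho


def eigenvalues_2x2_hermitian(m):
    """Eigenvalues of a 2x2 Hermitian matrix, from its trace t and determinant det."""
    t = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(t * t - 4 * det, 0.0))
    return ((t - disc) / 2, (t + disc) / 2)


s = 1 / math.sqrt(2)
ket0, ket1, ket_plus = [1 + 0j, 0j], [0j, 1 + 0j], [s + 0j, s + 0j]

rho_mixed = density_matrix([ket0, ket_plus], [0.5, 0.5])  # |0> or |+>, equally likely
rho_max = density_matrix([ket0, ket1], [0.5, 0.5])        # orthonormal basis, equal weights: I/2

for rho in (rho_mixed, rho_max):
    trace = (rho[0][0] + rho[1][1]).real
    print(trace, eigenvalues_2x2_hermitian(rho))
# rho_mixed = [[0.75, 0.25], [0.25, 0.25]]: trace 1, eigenvalues about 0.146 and 0.854
# rho_max = [[0.5, 0], [0, 0.5]]: trace 1, both eigenvalues 0.5
```

The non-zero off-diagonal element of `rho_mixed` reflects the fact that $|+\rangle$ is a superposition of the basis states.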


2.2 Quantum measurement and quantum complementarity

Continuing with the axiomatic introduction of quantum
mechanics, after the concept of mixed states and the
density operator formalism used to describe them, we
now describe the measurement of the state of a quantum
system.

In the following subsections we give two possible
formalizations of the measurement process, namely the
projective measurement and the POVM. Finally, we
discuss the concept of complementarity.


2.2.1 Projective measurement 

A first way to formalize the measurement process is
the projective measurement, or von Neumann measurement
(see [IT]DEED). In this description we associate with
the measurement a Hermitian operator $M$, and its
decomposition over the projectors onto its eigenspaces:

$$M = \sum_m m\,P_m \tag{19}$$

where $\{m\}$, the eigenvalues of $M$, are the possible
outcomes of the measurement, and the $\{P_m\}$ operators
are projectors, i.e. they satisfy the following properties:

$$\forall m, \quad P_m \text{ is Hermitian} \tag{20a}$$
$$\forall m, m', \quad P_m P_{m'} = \delta_{m,m'}\,P_m. \tag{20b}$$

The probability that the outcome of the measurement
is $m$ when the system is in the state $|\psi\rangle$ is:

$$p_\psi(m) = \langle\psi|P_m|\psi\rangle, \tag{21}$$


and immediately after such a measurement the state of
the system is:

$$\frac{P_m|\psi\rangle}{\sqrt{p_\psi(m)}}. \tag{22}$$










From the requirement that the sum of all the probabilities
(21) is equal to 1, we have the property of
completeness for the set of projectors:

$$\sum_m P_m = \mathbb{1}. \tag{23}$$


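The expectation value of the measurement $M$ if the
system is in the state $|\psi\rangle$ is:

$$\begin{aligned} E_\psi(M) &= \sum_m m\,p_\psi(m) \\ &= \sum_m m\,\langle\psi|P_m|\psi\rangle \\ &= \langle\psi|\Big(\sum_m m\,P_m\Big)|\psi\rangle \\ &= \langle\psi|M|\psi\rangle \\ &= \langle M\rangle_\psi, \end{aligned} \tag{24}$$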


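and the standard deviation is:

$$\Delta(M) = \sqrt{\big\langle (M - \langle M\rangle_\psi)^2 \big\rangle_\psi} = \sqrt{\langle M^2\rangle_\psi - \langle M\rangle_\psi^2}, \tag{25}$$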


where we have used the compact notation
$\langle\psi|\cdot|\psi\rangle = \langle\cdot\rangle_\psi$. Sometimes it is useful to write the
projectors as:

$$P_m = M_m^\dagger M_m \tag{26}$$

where the $M_m$ are called Kraus operators. Equations
(19)-(23) can be rewritten in terms of the Kraus
operators using (26).
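Equations (19)-(25) can be traced step by step on a qubit. The sketch below assumes, as an example not taken from the text, the observable $M = \sigma_z$ with eigenvalue +1 on $|0\rangle$ and −1 on $|1\rangle$, and a real state at Bloch angle θ = π/3. It computes the outcome probabilities (21), the post-measurement states (22), and checks that the mean (24) agrees with $\langle\psi|M|\psi\rangle$; all names are hypothetical.

```python
import math


def apply(m, v):
    """Matrix-vector product m|v>."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def inner(u, v):
    """Scalar product <u|v> (the bra is conjugated)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


# Assumed example: M = sigma_z = (+1) P_{+1} + (-1) P_{-1}, projectors on |0> and |1> (eq. 19).
projectors = {+1: [[1, 0], [0, 0]], -1: [[0, 0], [0, 1]]}
M = [[1, 0], [0, -1]]

theta = math.pi / 3
psi = [complex(math.cos(theta / 2)), complex(math.sin(theta / 2))]

# Outcome probabilities p_psi(m) = <psi|P_m|psi>, eq. (21).
probs = {m: inner(psi, apply(P, psi)).real for m, P in projectors.items()}
# Post-measurement states P_m|psi> / sqrt(p_psi(m)), eq. (22), for outcomes that can occur.
post = {m: [c / math.sqrt(probs[m]) for c in apply(P, psi)]
        for m, P in projectors.items() if probs[m] > 0}
# Expectation value and standard deviation, eqs. (24)-(25).
mean = sum(m * p for m, p in probs.items())
std = math.sqrt(sum(m * m * p for m, p in probs.items()) - mean ** 2)

print(probs)                                 # {1: 0.75, -1: 0.25} up to rounding
print(mean, inner(psi, apply(M, psi)).real)  # both equal <M>_psi = cos(theta) = 0.5
print(std)                                   # sqrt(0.75), about 0.866
```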


2.2.2 POVMs 


It is possible to generalize the projective measurement
and define the POVM (positive operator-valued
measure PS), where some of the hypotheses of the
projective measurement are relaxed. In particular, we
consider the collection of operators that represent the
measurement:

$$\{E_m\} \tag{27}$$

and relax the hypothesis that those operators are
projectors, requiring only that they are positive
(semi-definite) operators. Similarly to the projective
measurement, the probability that the outcome is $m$ if
the system is in $|\psi\rangle$ is:

$$p_\psi(m) = \langle\psi|E_m|\psi\rangle. \tag{28}$$

Also for the POVM we have the property of completeness,
$\sum_m E_m = \mathbb{1}$. However, since the operators (27) are not
projectors, they do not uniquely determine a set of Kraus
operators $M_m$ with $E_m = M_m^\dagger M_m$ as in (26), and
therefore the POVM alone does not define the state of
the system after the measurement.

A common situation with a POVM measurement is
when we have a quantum system in a mixed state,
where the possible states are represented by
some vectors of the Hilbert space $\{|\psi_m\rangle\}$, not
necessarily orthogonal to each other, and we want a
measurement in order to know in which of the states of
the set the system is. This POVM is represented by the
set of operators:

$$\{E_m = |\psi_m\rangle\langle\psi_m|\}. \tag{29}$$

Each of these operators is a (rank-one) projector;
however, since the $\{|\psi_m\rangle\}$ are not necessarily
orthogonal, they do not in general satisfy (20b), and
this POVM is not in general a projective measurement.
In this type of POVM, since the set of states does
not necessarily form a basis of the Hilbert space, the
completeness property has in general to be guaranteed
with suitable normalization coefficients.
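A standard example of such normalization coefficients (our choice of illustration, often called the "trine" POVM) uses three real qubit states at 120 degrees from each other in the $(|0\rangle, |1\rangle)$ plane. They are not orthogonal, and the operators $|\psi_k\rangle\langle\psi_k|$ sum to $\tfrac{3}{2}\mathbb{1}$, so the factor 2/3 restores completeness. The sketch below checks this and evaluates the probabilities (28); the names are hypothetical.

```python
import math

# Three real qubit states at 120 degrees from each other in the (|0>, |1>) plane: not orthogonal.
angles = [2 * math.pi * k / 3 for k in range(3)]
states = [[math.cos(a), math.sin(a)] for a in angles]

# POVM elements E_k = (2/3)|psi_k><psi_k|; the factor 2/3 is the normalization coefficient.
E = [[[2 / 3 * u[i] * u[j] for j in range(2)] for i in range(2)] for u in states]

# Completeness: sum_k E_k should be the 2x2 identity.
completeness = [[sum(Ek[i][j] for Ek in E) for j in range(2)] for i in range(2)]


def prob(Ek, psi):
    """p(k) = <psi|E_k|psi> for a real state vector psi (eq. 28)."""
    return sum(psi[i] * Ek[i][j] * psi[j] for i in range(2) for j in range(2))


print(completeness)                                  # identity, up to rounding
print([round(prob(Ek, states[0]), 4) for Ek in E])  # [0.6667, 0.1667, 0.1667]
```

Even when the system is in `states[0]`, all three outcomes can occur, and each $E_k$ satisfies $E_k^2 = \tfrac{2}{3}E_k$, so it is not a projector: this is the sense in which a POVM is not a projective measurement.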

2.2.3 Quantum complementarity 

If we consider the Hilbert space representing the states
of the quantum system, each basis can be seen as a
different POVM; in particular, an orthonormal basis
corresponds to a projective measurement. The preparation
and measurement of the quantum state of a
physical system can be described in the language of
QIT in terms of the encoding and decoding of information
by two parties, traditionally called Alice and
Bob. Quantum complementarity is then related to
the choice of the basis in which each party operates. If
we consider the example of a qubit, in figure 6 two
different orthogonal bases are shown, the computational
basis $\{|0\rangle, |1\rangle\}$ and the basis $\{|+\rangle, |-\rangle\}$, where

$$|+\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + |1\rangle\right) \tag{30a}$$
$$|-\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle - |1\rangle\right). \tag{30b}$$

Alice may choose to encode some information in the
qubit using the computational basis $\{|0\rangle, |1\rangle\}$, i.e. she
prepares the system in one of the two states of this
basis (see figures 5 and 6). The qubit is then
transmitted to Bob, who performs a measurement
to decode the information. If he chooses the diagonal
basis $\{|+\rangle, |-\rangle\}$, he will be in the situation where both
outcomes of the measurement have probability 0.5 (see
figure 6b), since $|\langle\pm|0\rangle|^2 = |\langle\pm|1\rangle|^2 = 1/2$. To describe
this situation in terms of information we can use the
concept of mutual information expressed in (9), and say
that the mutual information between the (classical)
random variable representing Bob's measurement outcome
and the (classical) random variable representing the
information encoded by Alice is zero. In other terms,
Bob cannot access the information encoded by Alice. This
situation expresses the concept of quantum complementarity,
and based on this concept Charles Bennett and Gilles
Brassard devised in 1984 the idea of quantum cryptography
m, which over the years has become one of the
most developed applications of QIT [13 ED El [231 El].



Figure 6: Two orthogonal reference frames in the plane,
representing two different projective measurements:
the computational basis $\{|0\rangle, |1\rangle\}$ and the basis
$\{|+\rangle, |-\rangle\}$ defined in (30). (a) A generic vector, with
its components on the two reference frames. (b) A special
case of an eigenvector of the first reference frame which
has equal components on the second reference frame.
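The zero mutual information of the Alice/Bob scenario can be checked with a few lines of Python. This is a sketch under the assumptions of the text: Alice's bit is uniformly distributed, the basis vectors are real, and Bob's outcome probabilities follow (21), i.e. $|\langle b|a\rangle|^2$. The names `BASES`, `joint_distribution` and `mutual_information` are our own.

```python
import math

S = 1 / math.sqrt(2)
BASES = {
    "computational": [[1.0, 0.0], [0.0, 1.0]],  # |0>, |1>
    "diagonal": [[S, S], [S, -S]],             # |+>, |-> of eq. (30)
}


def joint_distribution(alice_basis, bob_basis):
    """p(a, b): Alice sends a uniformly random bit a as a state of her basis, Bob measures in his."""
    joint = [[0.0, 0.0], [0.0, 0.0]]
    for a, sent in enumerate(BASES[alice_basis]):
        for b, outcome in enumerate(BASES[bob_basis]):
            overlap = sum(x * y for x, y in zip(outcome, sent))  # <b|a> for real vectors
            joint[a][b] = 0.5 * overlap ** 2                      # p(a) = 1/2 times |<b|a>|^2
    return joint


def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)


def mutual_information(joint):
    """I(A:B) = H(A) + H(B) - H(A,B), eq. (9), in bits."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    h_ab = shannon_entropy([p for row in joint for p in row])
    return shannon_entropy(p_a) + shannon_entropy(p_b) - h_ab


print(mutual_information(joint_distribution("computational", "computational")))  # about 1 bit
print(mutual_information(joint_distribution("computational", "diagonal")))       # about 0 bits
```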


2.3 von Neumann entropy 

In analogy to the definition of the information content of
a classical random variable (Shannon entropy) given
in (3), it is possible to define the von Neumann entropy
of a quantum random variable in the following way:

$$S(\rho) = -\frac{1}{\log 2}\,\mathrm{Tr}\left(\rho\log\rho\right). \tag{31}$$

Here $\mathrm{Tr}(\cdot)$ represents the trace, and $\rho$ is the density
operator representing the random variable whose
(quantum) information content is represented by $S$.
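In the eigenbasis of $\rho$ the trace in (31) reduces to a sum over the eigenvalues $\lambda_k$, so that $S(\rho) = -\sum_k \lambda_k \log_2 \lambda_k$, i.e. the Shannon entropy (6a) of the eigenvalues. The sketch below uses this for qubits only (the closed-form 2x2 eigenvalue formula is our own restriction; names are hypothetical): a pure state has zero entropy even when its density matrix has off-diagonal elements, while the maximally mixed state has one bit.

```python
import math


def eigenvalues_2x2_hermitian(m):
    """Eigenvalues of a 2x2 Hermitian matrix, from its trace t and determinant det."""
    t = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(t * t - 4 * det, 0.0))
    return [(t - disc) / 2, (t + disc) / 2]


def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho) = sum_k lambda_k log2(1/lambda_k), eq. (31), in bits."""
    return sum(lam * math.log2(1 / lam) for lam in eigenvalues_2x2_hermitian(rho) if lam > 1e-12)


pure = [[1, 0], [0, 0]]                      # |0><0|
superposition = [[0.5, 0.5j], [-0.5j, 0.5]]  # pure state (|0> - i|1>)/sqrt(2), not diagonal
maximally_mixed = [[0.5, 0], [0, 0.5]]       # I/2

print(von_neumann_entropy(pure))             # 0.0
print(von_neumann_entropy(superposition))    # 0.0
print(von_neumann_entropy(maximally_mixed))  # 1.0
```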

2.3.1 Quantum evolution 

To complete the axiomatic framework of quantum mechanics
we need one last postulate, about the evolution
of a quantum system. It states that the time evolution
of a quantum system is described by a unitary
transformation over the Hilbert space describing its
states:

$$|\psi(t)\rangle = U\,|\psi(0)\rangle. \tag{32}$$

Here we will not give the details of the actual unitary
operator, which is determined by the Schrödinger equation.
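A direct consequence of (32) is that norms and scalar products are preserved, so a pure state always evolves into a pure state. The sketch below checks this for one arbitrary, hypothetical 2x2 unitary (a phase followed by a Hadamard-like mixing); it is an illustration of the property, not the evolution of any specific physical system.

```python
import cmath
import math


def matvec(U, v):
    return [sum(U[i][j] * v[j] for j in range(len(v))) for i in range(len(U))]


def inner(u, v):
    """Scalar product <u|v> (the bra is conjugated)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def is_unitary(U, tol=1e-12):
    """Check U^dagger U = 1 entry by entry."""
    n = len(U)
    for i in range(n):
        for j in range(n):
            entry = sum(U[k][i].conjugate() * U[k][j] for k in range(n))
            if abs(entry - (1 if i == j else 0)) > tol:
                return False
    return True


# Hypothetical example evolution: a phase on |1> followed by a Hadamard-like mixing.
s = 1 / math.sqrt(2)
phase = cmath.exp(1j * math.pi / 5)
U = [[s, s * phase], [s, -s * phase]]

psi0 = [complex(0.6), complex(0, 0.8)]  # normalized: 0.36 + 0.64 = 1
chi0 = [complex(1.0), complex(0.0)]
psi_t, chi_t = matvec(U, psi0), matvec(U, chi0)

print(is_unitary(U))                                  # True
print(abs(inner(psi_t, psi_t)))                       # norm preserved: 1 up to rounding
print(abs(inner(psi_t, chi_t) - inner(psi0, chi0)))   # scalar products preserved: about 0
```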


2.4 Holevo theorem (Holevo bound) 

One of the most important results of QIT is the following
theorem, named after Alexander Holevo [213. As
for the description of quantum complementarity, this
result is best described in terms of the interaction
between the two parties, Alice and Bob.


Theorem 2.1 (Holevo bound) Suppose that
Alice prepares the quantum system in a mixed state
described by the density operator $\rho_X$, where
$X = \{|x_1\rangle, \dots, |x_n\rangle\}$ are the possible pure states and
$\{p_1, \dots, p_n\}$ are the corresponding probabilities. Then
Bob performs a measurement, described by a POVM
built (as described in section 2.2.2) on the set of pure
states $Y = \{|y_1\rangle, \dots, |y_n\rangle\}$, and we denote by $y$ the
outcome of this measurement. It is possible to prove that,
for any such measurement Bob may perform, there is an
upper bound for the mutual information (9) between
the two random variables $X$ and $Y$. In particular:

$$I(X : Y) \le S(\rho) - \sum_x p_x\,S(\rho_x) \tag{33}$$

where $\rho = \sum_x p_x\,\rho_x$ is the density operator describing
the global mixed state prepared by Alice, and
$\rho_x = |x\rangle\langle x|$.

It is worth stressing that, from the point of view of
Alice (the sender), the information she encodes in the
system is classical information. We can represent
it as the integer index labelling the states in the set
of quantum states $X = \{|x_1\rangle, \dots, |x_n\rangle\}$ chosen for the
encoding. On the other hand, from the point of view of
Bob (the receiver), the system is in a quantum mixed
state. The following theorem expresses the relationship
between the information contained in those two
random variables.


Theorem 2.2 Given a classical random variable, encoded
in a quantum system using the set of pure states
$X = \{|x_1\rangle, \dots, |x_n\rangle\}$, the relation between the information
contained in this classical random variable and the
quantum information contained in a mixed quantum
state $\rho_X$ built with those pure states is:

$$S(\rho) - \sum_x p_x\,S(\rho_x) \le H(X) \tag{34}$$

with equality reached when $\{|x_1\rangle, \dots, |x_n\rangle\}$
are all orthogonal vectors.


Because of this second result, we can express the
Holevo theorem (33) by saying that in a quantum
encoding-decoding process the amount of information
that Bob can access is in general less than the (classical)
information initially encoded by Alice, and that
this information can be fully accessed only in the special
case where the set of quantum states used for the
encoding is orthogonal.
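The right-hand side of (33), often called the Holevo quantity χ, can be evaluated for a simple qubit encoding. The sketch below assumes Alice encodes one uniformly random bit (so H(X) = 1) in pure real qubit states, for which every $S(\rho_x) = 0$ and χ reduces to $S(\rho)$; names and example states are hypothetical. Orthogonal states reach χ = H(X), as stated by Theorem 2.2, while the non-orthogonal pair $|0\rangle, |+\rangle$ gives χ ≈ 0.601 < H(X).

```python
import math


def qubit_entropy(rho):
    """von Neumann entropy in bits of a real symmetric 2x2 density matrix, from its eigenvalues."""
    t = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(t * t - 4 * det, 0.0))
    eigenvalues = ((t - disc) / 2, (t + disc) / 2)
    return sum(lam * math.log2(1 / lam) for lam in eigenvalues if lam > 1e-12)


def holevo_chi(states, probs):
    """chi = S(rho) - sum_x p_x S(rho_x) for pure real qubit states, where every S(rho_x) = 0."""
    rho = [[sum(p * v[i] * v[j] for p, v in zip(probs, states)) for j in range(2)]
           for i in range(2)]
    return qubit_entropy(rho)


s = 1 / math.sqrt(2)
H_X = 1.0  # Alice encodes one uniformly random bit: H(X) = 1
chi_orthogonal = holevo_chi([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])  # |0>, |1>
chi_overlapping = holevo_chi([[1.0, 0.0], [s, s]], [0.5, 0.5])     # |0>, |+>
print(chi_orthogonal, chi_overlapping)  # 1.0 (= H_X) and about 0.601 (< H_X)
```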

2.5 No-cloning theorem 

Another important result of QIT is the no-cloning theorem,
introduced by Wootters, Zurek and Dieks
in 1982 |251 12Z1. It is a no-go theorem that can be
stated very briefly as follows:

Theorem 2.3 (No-cloning) It is impossible to create
an identical copy of an arbitrary unknown quantum
state.

The crucial part is the fact that the theorem applies 
to a situation where the state is unknown. 

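The theorem can also be expressed by the following
alternative statement:

Theorem 2.4 (No-cloning) Given two states
$\{|\psi_1\rangle, |\psi_2\rangle\} \subset \mathcal{H}$ which are non-orthogonal, i.e.
$0 < |\langle\psi_1|\psi_2\rangle| < 1$, there does not exist a unitary
transformation acting on two copies of the system,
$U : \mathcal{H} \otimes \mathcal{H} \to \mathcal{H} \otimes \mathcal{H}$, such that

$$U\big(|\psi_i\rangle|0\rangle\big) = |\psi_i\rangle|\psi_i\rangle \tag{35}$$

when i is not known, i.e. when it is unknown which of the
two states $|\psi_i\rangle \in \{|\psi_1\rangle, |\psi_2\rangle\}$ is given.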
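The standard argument behind Theorem 2.4 is that a unitary preserves inner products: applying (35) to both states would require <psi_1|psi_2> = <psi_1|psi_2>^2, which holds only if the overlap is 0 or 1. The Python sketch below (assuming NumPy) makes this concrete with the CNOT gate, chosen here only as an illustrative candidate for U: it copies the orthogonal states |0> and |1> into a blank register |0>, but applied to |+> it produces an entangled state instead of |+>|+>, so it cannot clone the non-orthogonal pair {|0>, |+>}.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)

# CNOT on two qubits (first = control, second = target), basis order |00>, |01>, |10>, |11>.
CNOT = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)


def copy_fidelity(psi: np.ndarray, U: np.ndarray) -> float:
    """|<psi psi| U |psi 0>|^2, which equals 1 iff U maps |psi>|0> to |psi>|psi>."""
    output = U @ np.kron(psi, ket0)
    target = np.kron(psi, psi)
    return float(abs(np.vdot(target, output)) ** 2)


for name, psi in [("|0>", ket0), ("|1>", ket1), ("|+>", ket_plus)]:
    print(name, copy_fidelity(psi, CNOT))
```

The fidelity is 1 for the two basis states and 1/2 for |+>, in agreement with the inner-product argument: the overlap |<0|+>| = 1/sqrt(2) differs from its square.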

3 The Black Hole Information Paradox

3.1 Black holes 

For the purpose of this review, black holes (BHs) can
be briefly described as objects so dense, and with a
gravitational field so strong, that they are surrounded
by a surface, called the event horizon, inside which the
escape velocity exceeds the speed of light. This implies
that no physical object, not even light itself, can ever
leave a BH once it is inside its event horizon.

3.2 Hawking radiation and black hole evaporation

The work of Stephen Hawking in 1974 introduced
the notion of Hawking radiation (HR). This phenomenon
is due to quantum vacuum fluctuations, which were
discussed and theorized at the beginning of the 20th
century by the scientists who contributed to the
development of quantum theory. Quantum vacuum
fluctuations are in turn linked to what was subsequently
formalized as the Heisenberg uncertainty principle, and
can be summarized as the continuous and very rapid
creation and annihilation of particle-antiparticle pairs
(see Figure 7). Hawking theorized that there is a non-zero
probability that a particle-antiparticle pair is
generated close enough to the BH's event horizon
that one of the two particles manages to escape before
they re-annihilate, while the other is trapped inside the
horizon. The net effect is radiation emitted by the
BH, which carries away some of its energy; because of
mass-energy equivalence, the phenomenon can therefore
be described as the evaporation of the BH. The Hawking
radiation has an extremely low intensity, but if
the BH is small enough it can lead to the complete
evaporation of the BH in a time that is short compared
to the age of the universe. In his subsequent detailed
quantum-mechanical calculations [32], [35], Hawking
also showed that the HR is emitted in a maximally
mixed state (see section 2.2).

Figure 7: Schematics of the mechanism of quantum
vacuum fluctuation and generation of Hawking radiation.
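To put numbers on "small enough", the Python sketch below uses the standard semiclassical estimate t_ev ~ 5120 pi G^2 M^3 / (hbar c^4) for the evaporation time of a Schwarzschild BH of mass M. This formula is not derived in this review: it assumes emission of massless quanta only and ignores greybody factors, so it should be read as an order-of-magnitude estimate. The example masses are illustrative choices.

```python
import math

# Physical constants in SI units (CODATA values)
G = 6.67430e-11          # gravitational constant [m^3 kg^-1 s^-2]
HBAR = 1.054571817e-34   # reduced Planck constant [J s]
C = 2.99792458e8         # speed of light [m/s]
SECONDS_PER_YEAR = 3.156e7
AGE_OF_UNIVERSE_YEARS = 1.38e10
SOLAR_MASS = 1.989e30    # [kg]


def evaporation_time_years(mass_kg: float) -> float:
    """Order-of-magnitude Hawking evaporation time, t ~ 5120*pi*G^2*M^3/(hbar*c^4)."""
    seconds = 5120 * math.pi * G**2 * mass_kg**3 / (HBAR * C**4)
    return seconds / SECONDS_PER_YEAR


for label, mass in [("1 solar mass", SOLAR_MASS), ("1e11 kg", 1e11)]:
    t = evaporation_time_years(mass)
    print(f"{label}: {t:.2e} yr ({t / AGE_OF_UNIVERSE_YEARS:.1e} x age of universe)")
```

The strong M^3 dependence is what makes stellar-mass BHs effectively eternal (around 1e67 years), while a BH of about 1e11 kg would evaporate within the present age of the universe.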

3.3 Black hole paradox 

Since it is always possible to prepare the BH, as soon as
it forms, in a pure state, and then leave it isolated, the
phenomenon of HR leads to a contradiction. Indeed,
if we consider an isolated BH as an isolated quantum
system, according to the postulates of QM seen in
section 2.3 its evolution should be described by a
unitary transformation. But if we consider the process of
complete evaporation of the BH, and take into account
that the HR is emitted in a mixed state, we would have
the evolution of an isolated quantum system from a
pure state to a mixed state, in contradiction with that
postulate. For what follows, it is worth remembering
that in a maximally mixed state each state of the
mixture is equiprobable. So if we describe the
final state of the Hawking radiation after the complete
evaporation as a quantum random variable, it is in a
maximally mixed state, and therefore it has zero mutual
information with the quantum random variable
describing the initial state.
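The core of the contradiction is that unitary evolution rho -> U rho U^dagger leaves the eigenvalues of rho, and hence its von Neumann entropy, unchanged: a pure state (S = 0) can never be mapped unitarily to a maximally mixed state of dimension d (S = log2 d). A quick numerical check in Python, assuming NumPy and using a random unitary obtained from a QR decomposition (an arbitrary choice made only for this illustration):

```python
import numpy as np


def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho (in bits)."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # convention: 0 log 0 = 0
    return float(-np.sum(eigvals * np.log2(eigvals)))


def random_unitary(d: int, rng: np.random.Generator) -> np.ndarray:
    """Random d x d unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Rescale columns by unit-modulus phases; q stays unitary.
    return q * (np.diag(r) / np.abs(np.diag(r)))


rng = np.random.default_rng(0)
d = 4
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)

rho_pure = np.outer(psi, psi.conj())       # initial pure state, S = 0
U = random_unitary(d, rng)
rho_evolved = U @ rho_pure @ U.conj().T    # unitary evolution of the isolated system
rho_max_mixed = np.eye(d) / d              # maximally mixed state, S = log2(d)

print(von_neumann_entropy(rho_pure), von_neumann_entropy(rho_evolved),
      von_neumann_entropy(rho_max_mixed))
```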

3.3.1 BH paradox in terms of QIT 

It is possible to rephrase this contradiction using the
concepts of quantum information theory, so as to show
that contradicting the postulate of unitary evolution of
an isolated quantum system is equivalent to contradicting
the no-cloning theorem introduced in section 2.5.
Let us consider a physical system, containing a certain
amount of information, dropped into the BH at
an early time, and let us ask whether this
information can in principle be retrieved at a later time
(see Figure 8).



Figure 8: Information falling into the event horizon:
can it, even in principle, be retrieved? From the point
of view of an in-falling observer, crossing the event
horizon has no physical effect, and this suggests that
the information is not destroyed when it falls inside
the horizon either.

In a deterministic system, following the dynamical
equations that describe its evolution, it is in principle
possible to reconstruct an earlier state once we
fully know the state at a later time (with emphasis on
the full knowledge of every degree of freedom and their
correlations). So, if a BH is well described by quantum
mechanics, the answer to the question about
information retrieval should be affirmative, and the
Hawking radiation is a good candidate to explain how
the information can escape. This in turn would call into
question general relativity, from which the very definition
of the event horizon descends [55], because by definition
nothing can escape the event horizon.

If on the other hand the answer to the question about 
the information retrieval is negative, then it means 
that the quantum-mechanical description of the BH 
and its evolution has to be revised. 

Moreover, if we accept the notion that the information
initially dropped inside the event horizon somehow
eventually escapes via the Hawking radiation, we run
into another problem. Indeed, from the point of view
of an in-falling observer, crossing the event horizon has
no physical effect. So we can safely assume that the
information dropped into the BH still exists intact inside
the event horizon (at least until it reaches the internal
singularity of the BH).

Therefore, if the information also escapes, then for at
least a finite time two copies of the same information
exist, one inside and one outside the event horizon. This
would contradict the no-cloning theorem of section 2.5.

3.3.2 Contributions to the solution from QIT 

Although the BH information paradox is still an open 
problem, QIT has contributed to its comprehension 
with some important results and insights. 

Jacob Bekenstein is one of the leading authors of
this line of research [36]. In 1972 he introduced a
generalized second law describing the thermodynamics
of BHs [57], and in 1973 he introduced a definition
of the BH entropy as being proportional to its area
A and inversely proportional to the square of the Planck
length $\ell_P$:

$$S_{BH} \propto \frac{A}{\ell_P^2} \tag{36}$$
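Equation (36) fixes only the proportionality; the coefficient 1/4, obtained later by Hawking, gives the Bekenstein–Hawking entropy S_BH = k_B A / (4 l_P^2), with l_P^2 = G hbar / c^3 and, for a Schwarzschild BH, A = 4 pi r_s^2 with r_s = 2 G M / c^2. The Python sketch below evaluates it in units of k_B (SI constants; the solar-mass example is only an illustrative choice). Since A grows as M^2, so does the entropy.

```python
import math

G = 6.67430e-11          # gravitational constant [m^3 kg^-1 s^-2]
HBAR = 1.054571817e-34   # reduced Planck constant [J s]
C = 2.99792458e8         # speed of light [m/s]
SOLAR_MASS = 1.989e30    # [kg]


def schwarzschild_radius(mass_kg: float) -> float:
    """r_s = 2 G M / c^2, in metres."""
    return 2 * G * mass_kg / C**2


def bh_entropy_in_kB(mass_kg: float) -> float:
    """S_BH / k_B = A / (4 l_P^2), with A the horizon area and l_P^2 = G*hbar/c^3."""
    area = 4 * math.pi * schwarzschild_radius(mass_kg) ** 2
    planck_length_sq = G * HBAR / C**3
    return area / (4 * planck_length_sq)


print(f"r_s(Sun)  = {schwarzschild_radius(SOLAR_MASS):.3e} m")
print(f"S_BH(Sun) = {bh_entropy_in_kB(SOLAR_MASS):.3e} k_B")
```

The result, of order 1e77 k_B for a solar-mass BH, is many orders of magnitude larger than the thermodynamic entropy of a star of the same mass, which is part of what makes the information content of BHs so puzzling.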

Bekenstein [55], and later Bousso, found upper
bounds for the BH entropy. Because of the double
meaning of entropy, as both a thermodynamic parameter
and a measure of the information content of a system
(see section 1.1), these results have suggested an
information-theoretic approach to solving the paradox.

Hayden and Preskill [43] have used results from
quantum error correction to extend a result already
found by Page. When the BH is in an advanced
stage of its evaporation, more precisely when its entropy
is less than half the initial amount, they prove
that the information retention time, i.e. the time
needed for the information dropped into the event horizon
to re-emerge in the Hawking radiation, is relatively
short, and in particular:

$$t_{\text{info}} = O(r_s \log r_s) \tag{37}$$

where $r_s$ is the Schwarzschild radius.


Another contribution to the solution of the BH
information paradox, also used by Hayden and Preskill,
is the concept of BH complementarity [42], [35]. This
approach considers two possibilities: when information
traveling toward the BH from outside reaches the event
horizon, it is either transmitted inside or reflected
outside. The suggestion is then that, instead of
choosing between those two possibilities, we can accept
them both. To solve the conflict with the no-cloning
theorem, we assume that, because of the quantum
complementarity discussed in section 2.2.3, it is
impossible for any observer to observe both descriptions,
or to access both copies of the information. An external
observer will see the incoming information being
absorbed by the event horizon and then re-transmitted
outside by means of the Hawking radiation, the whole
process being unitary. An observer falling through the
event horizon from outside will not notice the crossing,
and will continue to observe the information that is
falling with him. But he will not be able to access the
information reflected outside with the Hawking radiation,
because it will be encoded in a different basis,
such that the mutual information is zero.
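The last point can be illustrated with a classical bit encoded in the computational (Z) basis {|0>, |1>}: a measurement in the same basis recovers the full bit, while a measurement in the conjugate (X) basis {|+>, |->} gives outcomes statistically independent of the encoded bit, i.e. zero mutual information. The Python sketch below (assuming NumPy; the choice of bases is illustrative and is not taken from the BH complementarity literature) computes both cases from the Born rule.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)
ket_minus = (ket0 - ket1) / np.sqrt(2)


def joint_distribution(encoding, measurement, priors=(0.5, 0.5)) -> np.ndarray:
    """p(b, k) = p(b) |<m_k|psi_b>|^2 (Born rule)."""
    joint = np.zeros((len(encoding), len(measurement)))
    for b, (p_b, psi) in enumerate(zip(priors, encoding)):
        for k, m in enumerate(measurement):
            joint[b, k] = p_b * abs(np.vdot(m, psi)) ** 2
    return joint


def mutual_information(joint: np.ndarray) -> float:
    """I(B;K) in bits from a joint probability table."""
    pb = joint.sum(axis=1, keepdims=True)
    pk = joint.sum(axis=0, keepdims=True)
    independent = pb @ pk  # outer product p(b) p(k)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / independent[mask])))


z_basis = [ket0, ket1]
x_basis = [ket_plus, ket_minus]
print("same basis:     ", mutual_information(joint_distribution(z_basis, z_basis)))
print("conjugate basis:", mutual_information(joint_distribution(z_basis, x_basis)))
```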

Another important result worth mentioning is the
holographic principle, a general result which can be
stated as follows: “Physical processes in a system of
D dimensions are reflected in processes taking place
on the D − 1 dimensional boundary of that system.
There is an equivalence between theories of different
sorts written in space-times of different dimensions”.

The fields of QIT, astrophysics and general relativity
have all gained from this interdisciplinary approach;
as an example, the concept of the generalized second
law and the holographic principle have also led to
results in QIT. In particular, upper bounds have been
proven for the entropy outflow, which is a proxy for the
communication rate, or information channel capacity [36].


4 The renormalization group information flow


4.1 Description of the RG 

The renormalization group (RG) is, in essence, a tool
for extracting the macroscopic description of a physical
system (e.g. a field) from its microscopic model. First
of all, the change in the description going from the
microscopic to the macroscopic model is captured by the
change of the interaction (coupling) constant $g(\mu)$ in the
interaction term of the Hamiltonian.

This change can be described as the action of an
operator G applied to the interaction constant:

$$g(\mu_2) = G[g(\mu_1)] \tag{38}$$

where $\mu_i$ is a parameter that represents the different
scales. Although this transformation is called the
“renormalization group”, it is not formally a group (at
most a semigroup): it is just a “flow of transformations”
in the space of all the possible Hamiltonians. The main
reason why the RG is not a group is that, given a
transformation from a small-scale description to a
large-scale description, the inverse transformation is
not necessarily defined, since coarse graining discards
microscopic information.
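A standard toy model, not discussed in this review and used here only as an illustration, shows both the group-like composition and the missing inverse. Decimating every other spin of the one-dimensional Ising chain with nearest-neighbour coupling K = J/(k_B T) and no external field gives exactly tanh K' = tanh^2 K, and more generally tanh K' = (tanh K)^b for a rescaling by a factor b. Two steps with b = 2 compose into one step with b = 4, but K and -K are mapped to the same K', so the map cannot be inverted. A minimal Python sketch:

```python
import math


def decimate(K: float, b: int = 2) -> float:
    """Exact decimation map for the zero-field 1D Ising chain, tanh K' = (tanh K)**b.

    K = J / (k_B T) must be finite and moderate, so that tanh K < 1 in floating point.
    """
    return math.atanh(math.tanh(K) ** b)


K = 1.0
two_steps = decimate(decimate(K, 2), 2)  # rescale by 2, then by 2 again
one_step = decimate(K, 4)                # rescale by 4 at once
print(two_steps, one_step)               # equal up to rounding: composition law

print(decimate(K), decimate(-K))         # K and -K give the same K': no inverse map

# Iterating the map drives a finite coupling towards the fixed point K* = 0.
flow = [K]
for _ in range(6):
    flow.append(decimate(flow[-1]))
print(flow)
```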

In 1954 Murray Gell-Mann and Francis Low published
a work on quantum electrodynamics (QED),
in which they studied the photon propagator at
high energies. They introduced the concept of scaling
transformation with a group-like formalism, where
the group operator G transforms the electromagnetic
coupling parameter g:

$$G[g(\mu_2)] = \left(\frac{\mu_2}{\mu_1}\right)^{d} G[g(\mu_1)] \tag{39a}$$

$$g(\mu_2) = G^{-1}\left[\left(\frac{\mu_2}{\mu_1}\right)^{d} G[g(\mu_1)]\right] \tag{39b}$$

where d is a constant exponent. Equation (39) expresses
the requirement that the physical laws do not change
under the scaling. So the equation requires that the
coupling parameter changes between before and after the
scaling so as to take into account the scaling factor
$\mu_2/\mu_1$. Going from this discrete scaling $\mu_1 \to \mu_2$ to a
continuous scaling transformation, it is possible to
define a function $\beta(g)$ that expresses the corresponding
continuous transformation of the coupling parameter g:

$$\beta[g(\mu)] = \frac{dg(\mu)}{d\ln\mu} \tag{40}$$
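To see how (39) and (40) fit together, consider a hypothetical one-loop-like β function β(g) = b g^2 with a constant b (an assumption made only for this illustration). Integrating (40) gives 1/g(μ2) = 1/g(μ1) - b ln(μ2/μ1), which has exactly the form (39b) with G(g) = exp(-1/(b g)) and d = 1. The Python sketch below integrates (40) numerically in ln μ with a classical fourth-order Runge–Kutta scheme and compares the result with the discrete map (39b).

```python
import math

b = 0.1  # hypothetical coefficient in beta(g) = b * g**2


def beta(g: float) -> float:
    return b * g**2


def run_coupling(g1: float, mu1: float, mu2: float, steps: int = 1000) -> float:
    """Integrate dg/dln(mu) = beta(g) from mu1 to mu2 with classical RK4."""
    h = (math.log(mu2) - math.log(mu1)) / steps
    g = g1
    for _ in range(steps):
        k1 = beta(g)
        k2 = beta(g + 0.5 * h * k1)
        k3 = beta(g + 0.5 * h * k2)
        k4 = beta(g + h * k3)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g


def G(g: float) -> float:
    """Scaling function for which this beta function takes the form (39) with d = 1."""
    return math.exp(-1.0 / (b * g))


def G_inv(y: float) -> float:
    return -1.0 / (b * math.log(y))


g1, mu1, mu2 = 0.5, 1.0, 10.0
g_flow = run_coupling(g1, mu1, mu2)        # continuous flow, equation (40)
g_map = G_inv((mu2 / mu1) * G(g1))         # discrete map, equation (39b) with d = 1
print(g_flow, g_map)
```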


Between 1974 and 1975 Kenneth Wilson and John
Kogut introduced a more general description of this
idea. In this description, the large-scale (macroscopic)
behaviour is linked to the low-energy regime of the
model, because at long distances only long wavelengths
are relevant, while for the microscopic behaviour higher
energies are relevant. Accordingly, in the language of the
RG the microscopic, high-energy model is called the
ultraviolet limit, while the macroscopic, low-energy one
is called the infrared limit. Another way to express the
description at different scales is in terms of fine graining
and coarse graining.

To give an example of the low-energy approximation,
we can imagine a sinusoidal potential for the microscopic
model, and its approximation with a parabolic
potential for the macroscopic description. This is a
good description at low energies, i.e. near the bottom of
the microscopic potential. However, at high energies
this approximation may introduce some divergences,
arising for example from integrals over larger and larger
ranges of energies. Since those divergences are only
due to the approximate description of the potential,
they can be corrected by introducing a cut-off for the
high range of energies.

Figure 9: Abstract description of the renormalization
group. (a) Two different scales of modelling, with two
different interaction constants. (b) An example of such
different scales can be found in astrophysics, where the
description at the scale of stars (lower image) has an
interaction constant different from the description at
the scale of galaxies (upper image).

The dynamics of a composite system can be described
by the interactions between its components. At a
certain scale (graining) $\mu_1$ the physics of the model is
described by the Hamiltonian of the system, and in
particular by its interaction term, i.e. by the interaction
constant $g(\mu_1)$. At a bigger scale (coarse graining) $\mu_2$,
the components of the lower scale can be “clustered”
into a single element of the coarse graining (see
Figure 9a), and the interaction constant is in principle
changed. The constraint that “the physics at different
scales has to be the same” is expressed by equations
(38) and (39) for the interaction constant $g(\mu_i)$, and by
another equation that constrains the correlations at
different scales, the Callan–Symanzik equation:


$$\left[ m\frac{\partial}{\partial m} + \beta(g)\frac{\partial}{\partial g} + n\,\gamma \right] C^{(n)}(x_1,\ldots,x_n; m, g) = 0 \tag{41}$$

where m is the mass scale, $C^{(n)}$ is the correlation
function between the elements $(x_1, \ldots, x_n)$ of the system,
and β and γ are two functions that “compensate” the
effect of the scale change, so that the description (i.e.
the correlation function) at the different scales is
consistent. In particular β, which we have already seen
in (40), captures the change of the coupling constant,
while γ captures the change of the field itself.

In applying the group transformations, we move from
one point of the space (manifold) of all the possible
Hamiltonians, parametrized by their coupling constants,
to another. However, there are some points, called critical
points, or conformal points, where the β function
vanishes, so that the coupling $g(\mu)$ no longer changes
with scale. From another point of view, we can think of
the manifold of the Hamiltonians (each describing a
different model of the system, at different scales, with
different values of the coupling constant), and then think
of the RG transformations as describing a flow from one
model to another. The flow ends at the points that are
invariant under this transformation, so those points have
to be self-similar. In two dimensions, each critical point
is characterized by the value taken there by the c-function
discussed in section 4.2, and this value is called the
“central charge” of the system.
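Fixed points are the zeros of the β function, and whether a flow ends on one depends on its stability. As an illustration only, take the hypothetical β(g) = -eps g + b g^2, of the form met in the epsilon-expansion, with fixed points g* = 0 and g* = eps/b (the values of eps and b below are arbitrary). The Python sketch follows the flow towards the infrared (decreasing ln μ) with a simple Euler scheme, starting from couplings on either side of eps/b: both trajectories end at the non-trivial fixed point, where the coupling stops changing.

```python
eps, b = 0.2, 1.5  # hypothetical parameters of the toy beta function


def beta(g: float) -> float:
    return -eps * g + b * g**2


def flow_to_ir(g: float, dlnmu: float = 0.01, steps: int = 20000) -> float:
    """Euler-integrate dg/dln(mu) = beta(g) towards lower mu (ln(mu) decreasing)."""
    for _ in range(steps):
        g -= dlnmu * beta(g)
    return g


g_star = eps / b
for g0 in (0.01, 2 * g_star):
    print(g0, "->", flow_to_ir(g0))
print("fixed point:", g_star, "beta(g*) =", beta(g_star))
```

Near g = 0 the coupling grows as the scale increases towards the infrared, while above g* it decreases, so g* = eps/b attracts the infrared flow from both sides.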


4.2 The c-function and the link to QIT 

The c-theorem of Alexander Zamolodchikov identifies,
under certain conditions, in the case of two-dimensional
renormalizable field theories, a function c which
decreases monotonically along the RG transformations,
from the ultraviolet to the infrared.

This monotonicity suggests an information-theoretical
meaning for this function, analogous to the information
content.

Since the seminal result by Zamolodchikov, several
authors have worked on c-theorems in dimensions
higher than 2 [57] [55].

Another approach to the RG is the density matrix
renormalization group (DMRG). Osborne and Nielsen [62]
make the link between DMRG and QIT more explicit.
A characteristic feature of critical phenomena is the
emergence of collective behaviour, and it is conjectured
that quantum entanglement is the origin of this
cooperative behaviour. DMRG, with its explicitly
quantum-mechanical approach, seems the ideal formalism
with which to substantiate this conjecture [54].
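The quantity that links DMRG to QIT is the entanglement across a cut of the system. Writing a pure state of two blocks A and B in its Schmidt decomposition with coefficients s_k, the entanglement entropy is S = -sum_k s_k^2 log2 s_k^2, and the DMRG truncation step keeps only the chi largest Schmidt coefficients (equivalently, the dominant eigenvectors of the reduced density matrix), discarding a weight equal to the sum of the remaining s_k^2. The Python sketch below (assuming NumPy; it shows only this truncation step, not a full DMRG sweep, and the example states are arbitrary) computes both quantities from a singular value decomposition.

```python
import numpy as np


def schmidt_coefficients(psi: np.ndarray, dim_a: int, dim_b: int) -> np.ndarray:
    """Singular values of psi reshaped as a dim_a x dim_b matrix (descending order)."""
    return np.linalg.svd(psi.reshape(dim_a, dim_b), compute_uv=False)


def entanglement_entropy(s: np.ndarray) -> float:
    """S = -sum_k s_k^2 log2 s_k^2 (in bits), clipping tiny negative round-off."""
    p = s**2
    p = p[p > 1e-15]
    return max(0.0, float(-np.sum(p * np.log2(p))))


def truncate(psi: np.ndarray, dim_a: int, dim_b: int, chi: int):
    """Keep the chi largest Schmidt components; return (normalized state, discarded weight)."""
    u, s, vh = np.linalg.svd(psi.reshape(dim_a, dim_b), full_matrices=False)
    discarded = float(np.sum(s[chi:] ** 2))
    kept = ((u[:, :chi] * s[:chi]) @ vh[:chi, :]).reshape(-1)
    return kept / np.linalg.norm(kept), discarded


bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)  # maximally entangled pair of qubits
product = np.zeros(4)
product[0] = 1.0                                     # |00>, no entanglement

rng = np.random.default_rng(1)
psi = rng.standard_normal(64) + 1j * rng.standard_normal(64)  # random 6-qubit state
psi /= np.linalg.norm(psi)

print(entanglement_entropy(schmidt_coefficients(bell, 2, 2)))     # about 1 bit
print(entanglement_entropy(schmidt_coefficients(product, 2, 2)))  # 0 bits
psi_chi, discarded = truncate(psi, 8, 8, chi=2)
fidelity = abs(np.vdot(psi, psi_chi)) ** 2
print(discarded, 1 - fidelity)  # fidelity of the truncated state = 1 - discarded weight
```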

Finally, a different interdisciplinary approach, not
necessarily linked to information theory, is the parallel
between the renormalization used in quantum field
theory and the renormalization used in thermodynamics
and statistical mechanics to describe critical
phenomena [45] [65].